Abstract

In several interesting applications one is faced with the problem of simultaneous binary hypothesis testing and parameter estimation. Although such joint problems are not infrequent, there exists no systematic analysis in the literature that treats them effectively. Existing approaches consider the detection and the estimation subproblems separately, applying in each case the corresponding optimum strategy. As it turns out, the overall scheme is not necessarily optimum, since the criteria used for the two parts are usually incompatible. In this article we propose a mathematical setup that considers the two problems jointly. Specifically, we propose a meaningful combination of the Neyman-Pearson and the Bayesian criteria and we provide the optimum solution for the joint problem. In the resulting optimum scheme the two parts interact with each other, producing detection/estimation structures that are completely novel. A notable by-product of our work is the proof that the well-known GLR test is finite-sample-size optimum in this combined sense.

Index Terms 

GLRT, Joint detection/estimation. 

I. Introduction 

There exist applications in practice where one must resolve the following problem: decide between two hypotheses $H_0$ and $H_1$ and then, depending on the decision, estimate a corresponding set of parameters $\theta_0$ or $\theta_1$. A characteristic example of a problem that can be formulated under this combined detection/estimation framework is target detection and localization by MIMO radar, where one is interested not only in the classical radar detection problem (presence/absence of a target) but also in estimating the target's position every time a target is declared present [1], [2]. A second example is retrospective changepoint detection, where we are interested in determining whether there is a point in our samples after which the statistical behavior of the data has changed and, once such a change is detected, in localizing this point [3], [4]. Clearly, segmentation problems can be formulated as retrospective changepoint detection problems.

We would like to emphasize that our goal is not to solve the pure detection problem in the presence of unknown parameters (in that case the parameter estimation subproblem constitutes only an auxiliary step). In our approach the estimation part is a vital goal of the whole setup and of the same importance as the detection part. This is clearly the case in the two examples mentioned above, where the localization of the target in the first and of the changepoint in the second is as important as the detection itself. The current literature does not treat combined problems systematically, and the aim of this article is to fill exactly this gap.

Before formally introducing the combined problem, let us first briefly recall the corresponding formulation and the available finite-sample-size optimality results for detection and parameter estimation. For both problems we assume the existence of a random data vector $X \in \mathbb{R}^N$ of length $N$.

Binary hypothesis testing: We consider the following two hypotheses $H_0$, $H_1$ for $X$

$$H_i:\ X \sim f_i(X|\theta_i),\quad i = 0, 1, \tag{1}$$

where "$\sim$" means "distributed according to" and $f_i(X|\theta_i)$, $i = 0, 1$, are two distinct pdfs with $\theta_i$ denoting a vector of parameters under each hypothesis. Given a realization $X$ of the random data vector, one must decide between the two hypotheses $H_0$ and $H_1$. If $d \in \{0, 1\}$ denotes our decision, then under a Neyman-Pearson formulation we are interested in the following constrained minimization problem

$$\min P(d = 0|H_1), \quad \text{subject to } P(d = 1|H_0) \le \alpha, \tag{2}$$

where $P(\cdot)$ denotes probability and $\alpha \in (0, 1)$ the maximal allowable false alarm level. Optimization is performed over all decision strategies that satisfy the constraint.

Under a finite-sample-size setting, when the two pdfs are completely known, i.e. there are no unknown parameters, the optimum test is the celebrated Likelihood Ratio test. If the pdfs have unknown parameters then, except in the very rare case where a uniformly most powerful test can be found, the problem in (2) is not well defined and one needs to resort to min-max formulations for which no systematic solution exists. In this case it is very common to use the Generalized Likelihood Ratio (GLR) test

$$\frac{\sup_{\theta_1 \in \Theta_1} f_1(X|\theta_1)}{\sup_{\theta_0 \in \Theta_0} f_0(X|\theta_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda, \tag{3}$$

where $\Theta_i$ denotes some a-priori known set of values for $\theta_i$. For the GLR test there is no finite-sample-size optimality result. In fact there are counterexamples against this claim [5], [6], [7]. Nevertheless the use of the GLR test is widespread in applications, and one important by-product of our analysis is the demonstration that this popular detection scheme is in fact finite-sample-size optimum under the combined detection and estimation formulation we are proposing here. We would like to stress that this does not contradict the counterexamples reported in [5], [6], [7], since in these references the GLR test is evaluated as a pure detector and not in the combined sense we are proposing in this article.
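To make the form of (3) concrete, the following is a minimal sketch of a GLR statistic evaluated over finite parameter grids. The grids, the unit-variance Gaussian model, the function names and the threshold value are all assumptions chosen for illustration; they are not part of the original formulation.

```python
import math


def gaussian_loglik(x, theta):
    """Log-likelihood of i.i.d. unit-variance Gaussian samples with mean theta.

    The additive constant is dropped; it is the same for every theta and
    therefore cancels in the likelihood ratio.
    """
    return -0.5 * sum((xi - theta) ** 2 for xi in x)


def glr_statistic(x, theta1_grid, theta0_grid, loglik=gaussian_loglik):
    """Ratio of the maximized likelihoods over two (hypothetical) finite grids."""
    num = max(loglik(x, t) for t in theta1_grid)  # sup over Theta_1
    den = max(loglik(x, t) for t in theta0_grid)  # sup over Theta_0
    return math.exp(num - den)


x = [1.0, 1.0]
stat = glr_statistic(x, theta1_grid=[0.5, 1.0, 1.5], theta0_grid=[0.0])
decide_h1 = stat > 2.0  # 2.0 is an arbitrary illustrative threshold
```

Working with log-likelihoods and exponentiating only the difference avoids numerical underflow when the sample size is large.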

Regarding the problem in (2), if we assume that the parameters $\theta_i$ are random with known prior pdfs $\pi_i(\theta_i)$, $i = 0, 1$, then again (2) has a well defined solution, which is the likelihood ratio test between the two marginal pdfs $f_i(X) = \int f_i(X|\theta_i)\pi_i(\theta_i)\,d\theta_i$.

Parameter Estimation: In this problem, we assume that $X$ has a pdf $f(X|\theta)$ where $\theta$, as before, denotes a vector of parameters. Given a realization $X$ of the data vector, the goal is to use the data in order to provide an estimate $\hat\theta$ for $\theta$. Under a finite-sample-size setup, optimum estimation structures are available for the Bayesian formulation and only when $\theta$ is assumed to be random with a known prior pdf $\pi(\theta)$. Specifically, if $C(\hat\theta, \theta)$ denotes the cost of providing the estimate $\hat\theta$ when the true parameter value is $\theta$, then the optimum estimator that minimizes the average cost is

$$\hat\theta = \arg\inf_U \int C(U, \theta) f(X|\theta)\pi(\theta)\,d\theta. \tag{4}$$

With proper choice of the cost function $C(\hat\theta, \theta)$, this formula gives rise to a number of well known estimators such as the MAP, the conditional mean, or the conditional median.
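The three estimators named above can be illustrated on a discretized scalar parameter. The sketch below assumes that the posterior weights $f(X|\theta)\pi(\theta)$ have already been evaluated on a hypothetical increasing grid; the function name and the grid are illustrative assumptions.

```python
def bayes_estimators(grid, weights):
    """Return (MAP, conditional mean, conditional median) on a parameter grid.

    grid    : increasing list of candidate parameter values (assumed grid)
    weights : unnormalized posterior weights f(X|theta) * pi(theta) on the grid
    """
    total = sum(weights)
    post = [w / total for w in weights]  # normalized posterior
    # MAP: grid point carrying the largest posterior weight
    theta_map = grid[max(range(len(grid)), key=lambda k: post[k])]
    # Conditional mean: minimizes the expected squared error
    theta_mean = sum(t * p for t, p in zip(grid, post))
    # Conditional median: first grid point where the cumulative mass reaches 1/2
    acc = 0.0
    theta_med = grid[-1]
    for t, p in zip(grid, post):
        acc += p
        if acc >= 0.5:
            theta_med = t
            break
    return theta_map, theta_mean, theta_med


theta_map, theta_mean, theta_med = bayes_estimators(
    [0.0, 1.0, 2.0, 3.0], [0.1, 0.2, 0.3, 0.4]
)
```

On this skewed toy posterior the three estimates differ, which is exactly why the choice of cost function matters.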

Next we will combine the two problems and after defining a meaningful performance measure we will 
develop the optimum detection/estimation structure for the joint problem. 

II. Combined Detection and Estimation 

As we realize from the previous discussion, in both problems, finite-sample-size optimum solutions 
exist only if we assume that the parameters are random with some known prior. It is therefore natural 
to expect that the same assumption will be transferred to the more general combined problem. With this 
observation in mind, let us define the problem of interest. 

Consider a random data vector $X \in \mathbb{R}^N$ and the following two hypotheses $H_0$, $H_1$:

$$H_i:\ X \sim f_i(X|\theta_i) \text{ with prior pdf } \pi_i(\theta_i),\quad i = 0, 1. \tag{5}$$

Given any realization $X$ of the data vector we would like to decide between the two hypotheses $H_0$, $H_1$; and if our



November 25, 2009 



DRAFT 






IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. , NO. , 2009 (REVISED) 



decision is in favor of $H_i$, then we would like to provide an estimate $\hat\theta_i$ for the corresponding parameters $\theta_i$.

The priors $\pi_i(\theta_i)$ are considered to be generalized functions containing possible point masses. This will allow for the unified analysis of the problem with $\theta_i$ taking a continuum or a discrete set of values. Let us now define what we mean by a combined detection/estimation scheme.

A. Combined Detection/Estimation Structure 

We adopt the class of randomized detectors and estimators, and we propose the following two-step scheme. In the first step, with the help of two randomization probabilities $\delta_0(X)$, $\delta_1(X)$, we decide between $H_0$ and $H_1$. Quantity $\delta_i(X)$ denotes the probability with which we decide $d = i$ using a random game. Clearly $\delta_0(X) + \delta_1(X) = 1$. In the second step we provide parameter estimates that we generate with the help of randomized estimators. Specifically, we define two conditional pdfs $q_0(\hat\theta_0|X)$ and $q_1(\hat\theta_1|X)$ that satisfy $\int q_0(\hat\theta_0|X)\,d\hat\theta_0 = \int q_1(\hat\theta_1|X)\,d\hat\theta_1 = 1$. These two density functions are applied as follows: if in the first step we decide $d = i$, then in the second step we use the pdf $q_i(\hat\theta_i|X)$ to generate a random variable $\hat\theta_i$ distributed according to $q_i(\hat\theta_i|X)$. This variable constitutes our estimate. Randomized estimators are the direct analog of randomized tests used in hypothesis testing and are not uncommon in Bayesian approaches, as one can verify by consulting [8, page 65].

We should note that $q_i(\hat\theta_i|X)$ must have the same support as the prior $\pi_i(\theta_i)$, since we expect our estimate $\hat\theta_i$ to assume the same values as the true parameter $\theta_i$. This is particularly important if $\theta_i$ can take only a finite number of values, in which case $\pi_i(\theta_i)$ and $q_i(\hat\theta_i|X)$ will be comprised of point masses. In the latter case, it is easy to see that we can carry out the analysis using only probabilities instead of pdfs and replace integrals over $\theta_i$ and $\hat\theta_i$ with sums.

Summarizing: the combined detection/estimation structure is comprised of the two probabilities $\delta_0(X)$, $\delta_1(X)$ (used in the first step to distinguish between the two hypotheses $H_0$, $H_1$) and of the two pdfs $q_0(\hat\theta_0|X)$, $q_1(\hat\theta_1|X)$ (used to provide the necessary parameter estimate in the second step). We denote the complete detection/estimation structure as $\mathcal{D} = \{\delta_0(X), \delta_1(X), q_0(\hat\theta_0|X), q_1(\hat\theta_1|X)\}$.
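The two-step procedure can be sketched as follows for the case where the estimates take finitely many values. The function name, the dictionary representation of $q_0$, $q_1$ and the `rng` parameter are assumptions made for this illustration only.

```python
import random


def two_step_decision(delta1, q0, q1, rng=random):
    """One draw from a two-step randomized detector/estimator for a fixed X.

    delta1 : probability of deciding d = 1 (so delta0 = 1 - delta1)
    q0, q1 : dicts mapping candidate estimates to their probabilities,
             used when the decision is d = 0 or d = 1 respectively
    rng    : source of randomness exposing random() and choices()
    """
    # Step 1: randomized decision between H0 and H1
    d = 1 if rng.random() < delta1 else 0
    # Step 2: draw the estimate from the pmf attached to the decision taken
    q = q1 if d == 1 else q0
    values = list(q.keys())
    probs = list(q.values())
    theta_hat = rng.choices(values, weights=probs, k=1)[0]
    return d, theta_hat
```

A deterministic detector/estimator is recovered as the special case where `delta1` is 0 or 1 and each pmf places all its mass on a single value.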

Remark 1: One might wonder if the adoption of a two-step procedure covers all possibilities for a randomized detector/estimator. It turns out that we could also use one-step detectors/estimators that simultaneously detect and estimate. However, it is straightforward to show that such schemes can be simulated by properly selected two-step procedures; furthermore, the opposite is also true, that is, any two-step detector/estimator can be simulated by a proper one-step procedure. Consequently the two approaches are fully equivalent and, without loss of generality, we may limit ourselves to the two-step schemes introduced above¹.

¹ Our claim is particularly easy to prove when the parameters $\theta_i$ take only a finite number of values.

In the next subsection our aim is to define a suitable performance measure for $\mathcal{D}$ and a corresponding optimization problem that will lead to the identification of the optimum detection/estimation structure.

B. Combined Optimization Problem

As we mentioned in the Introduction, we are going to combine the Bayesian with the Neyman-Pearson approach. To this end let $C_{ji}(\hat\theta_j, \theta_i)$ denote the cost of deciding in favor of hypothesis $H_j$ in the first step and providing the estimate $\hat\theta_j$ in the second step, when the true hypothesis is $H_i$ and the true parameter is $\theta_i$.

Let us consider the average cost $\mathscr{J}_i(\mathcal{D})$ given that the true hypothesis is $H_i$. We can express $\mathscr{J}_i(\mathcal{D})$ in terms of the complete detection/estimation structure as follows

$$\mathscr{J}_i(\mathcal{D}) = \int \left\{ \delta_0(X) \int q_0(\hat\theta_0|X)\mathscr{D}_{0i}(\hat\theta_0, X)\,d\hat\theta_0 + \delta_1(X) \int q_1(\hat\theta_1|X)\mathscr{D}_{1i}(\hat\theta_1, X)\,d\hat\theta_1 \right\} dX, \tag{6}$$

where $\mathscr{D}_{ji}(U, X) = \int C_{ji}(U, \theta_i) f_i(X|\theta_i)\pi_i(\theta_i)\,d\theta_i$. As we can see, the four functions $\mathscr{D}_{ji}(U, X)$ depend on the known cost functions $C_{ji}(U, \theta_i)$ and on prior information; consequently they are also known and independent of the detection/estimation structure $\mathcal{D}$.

We can now define the following optimization problem that we propose as an alternative to the classical problem depicted in (2):

$$\inf_{\mathcal{D}} \mathscr{J}_1(\mathcal{D}), \quad \text{subject to } \mathscr{J}_0(\mathcal{D}) \le \alpha. \tag{7}$$

Level $\alpha$ constitutes the maximally allowable cost under hypothesis $H_0$. As we can see by direct comparison with (2), we follow a Neyman-Pearson like approach, having replaced the (conditional) error probabilities of the classical approach with the conditional Bayesian costs. The problem defined in (7) is well motivated. Indeed, if one is interested in parameter estimation under each hypothesis then the primary concern is the induced average estimation cost, which quantifies the quality of the corresponding estimate. It is therefore understandable that both the detection and the estimation subproblems must contribute towards the optimization of the same figure of merit.

Before continuing with the general solution of our problem, we would like to consider a special case which establishes finite-sample-size optimality for the GLR test. The practical significance of this popular test certainly justifies this special analysis. There is however an additional reason that makes this short parenthesis necessary: we plan to use the GLR test as our prototype, therefore we will examine under what conditions we can guarantee its optimality. Then we will apply similar assumptions in the general case, in order to generate GLR-like tests that are compatible with various well known cost functions used in applications. This will produce novel tests that are hopefully more suitable than the classical GLR test for these problems.

C. Optimality of the GLR Test 

Consider the case where $\theta_i$ takes a finite set of values. Without loss of generality, we will assume that $\theta_i = 1, 2, \ldots, L_i$ and, for simplicity, when $\theta_i = l$ we are going to denote the corresponding pdf as $f_{il}(X)$ instead of $f_i(X|\theta_i = l)$. This immediately suggests that the two prior pdfs $\pi_i(\theta_i)$ will be comprised of an equivalent number of point masses. We denote the corresponding prior probabilities with $\pi_{il}$. In other words, under hypothesis $H_i$ we have $X \sim f_{il}(X)$ with prior probability $\pi_{il}$, where $i = 0, 1$ and $l = 1, \ldots, L_i$. Since $\hat\theta_i$ assumes a finite number of values, the estimators $q_i(\hat\theta_i|X)$ will be comprised of point masses as well. Let $q_{il}(X)$ denote the corresponding probabilities. Our detection/estimation structure can then be identified as the following collection of probabilities

$$\mathcal{D} = \{\delta_0(X), \delta_1(X), q_{01}(X), \ldots, q_{0L_0}(X), q_{11}(X), \ldots, q_{1L_1}(X)\} \tag{8}$$

with the following properties

$$\delta_i(X) \ge 0; \quad q_{il}(X) \ge 0; \quad \delta_0(X) + \delta_1(X) = \sum_{l=1}^{L_i} q_{il}(X) = 1. \tag{9}$$

As before, the probabilities $\delta_0(X)$, $\delta_1(X)$ are used in the first step to decide between the two main hypotheses. Given that the decision in the first step is in favor of $H_i$, we go to the second step and, with the help of the probabilities $q_{il}(X)$, $l = 1, \ldots, L_i$, we select by means of a randomized test among the possibilities $f_{i1}(X), \ldots, f_{iL_i}(X)$.

Consider now the following special case of cost functions

$$C_{10}(\hat\theta_1, \theta_0) = C_{01}(\hat\theta_0, \theta_1) = 1; \quad C_{11}(\hat\theta, \theta) = C_{00}(\hat\theta, \theta) = \mathbb{1}_{\{\hat\theta \ne \theta\}}, \tag{10}$$

where $\mathbb{1}_A$ denotes the indicator of the set $A$. In other words, the cost is 0 only when both steps make the correct selection and it is equal to 1 otherwise. The corresponding average cost $\mathscr{J}_i(\mathcal{D})$ is then equal to the probability of detection/estimation-error under hypothesis $H_i$. We have the following theorem that solves the problem defined in (7).

Theorem 1. Consider the class $\mathcal{J}_\alpha$ of all detection/estimation strategies that satisfy the constraint

$$P(\text{Detection/estimation-error}\,|\,H_0) \le \alpha, \tag{11}$$

where $\alpha_{\min} \le \alpha \le 1$, with

$$\alpha_{\min} = 1 - \int \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}\,dX. \tag{12}$$

The test, within the class $\mathcal{J}_\alpha$, that minimizes the probability $P(\text{Detection/estimation-error}\,|\,H_1)$ is given by:

Step 1: The optimum strategy for deciding between the two main hypotheses $H_0$ and $H_1$ is

$$\frac{\max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\}}{\max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda, \tag{13}$$

where, whenever the left hand side coincides with the threshold, we perform a randomization between the two hypotheses and select $H_1$ with probability $\gamma$.

Step 2: If in Step 1 we decide in favor of hypothesis $H_i$ then the optimum estimation strategy is

$$\hat\theta_i = \arg\max_{1 \le l \le L_i} \{\pi_{il} f_{il}(X)\}. \tag{14}$$

If more than one index attains the same maximum we perform an arbitrary randomization among them.

The threshold $\lambda$ and the randomization probability $\gamma$ of Step 1 must be selected so that the constraint in (11) is satisfied with equality.
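The two steps of Theorem 1 can be sketched for a single observation as follows. The function name, the list representation of the weights $\pi_{il} f_{il}(X)$ and the externally supplied uniform draw `u` are assumptions of this illustration; selecting $\lambda$ and $\gamma$ to meet the constraint (11) is not addressed here.

```python
def theorem1_test(w0, w1, lam, gamma=0.5, u=0.0):
    """Apply Step 1 and Step 2 to one observation X.

    w0, w1 : lists with the weights pi_0l * f_0l(X) and pi_1l * f_1l(X)
    lam    : threshold of Step 1
    gamma  : probability of selecting H1 when the statistic equals lam
    u      : a uniform(0, 1) draw, used only in case of such a tie
    Returns (decision, 1-based index of the selected subcase).
    """
    m0, m1 = max(w0), max(w1)
    # Step 1: compare the two maxima (no division, so m0 == 0 is harmless)
    if m1 > lam * m0:
        d = 1
    elif m1 == lam * m0:
        d = 1 if u < gamma else 0
    else:
        d = 0
    # Step 2: MAP selection within the hypothesis chosen in Step 1
    # (ties are broken by the first index, one admissible randomization)
    w = w1 if d == 1 else w0
    l_hat = w.index(max(w)) + 1
    return d, l_hat
```

Note that Step 2 never looks at the weights of the hypothesis that was rejected, which is the separation property discussed later in Remark 2.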

Proof: We observe that $P(\text{Detection/estimation-error}\,|\,H_i) = 1 - P(\text{Correct-detection/estimation}\,|\,H_i)$, therefore the constraint is equivalent to $P(\text{Correct-detection/estimation}\,|\,H_0) \ge 1 - \alpha$. If we denote the possibility $\{X \sim f_{il}(X)\}$ with $H_{il}$ then we can write

$$P(\text{Correct-detection/estimation}\,|\,H_i) = \sum_{l=1}^{L_i} P(\text{Correct-detection/estimation}\,|\,H_{il})\,\pi_{il} \tag{15}$$

with

$$P(\text{Correct-detection/estimation}\,|\,H_{il}) = \int \delta_i(X) q_{il}(X) f_{il}(X)\,dX. \tag{16}$$

Instead of minimizing the probability of detection/estimation-error we can equivalently maximize the probability of correct-detection/estimation. To solve the constrained optimization problem, let $\lambda \ge 0$ be a Lagrange multiplier and, as in the classical Neyman-Pearson case, with the help of (15) and (16), define the corresponding unconstrained version. We then note

\begin{align}
& P(\text{Correct-detection/estimation}\,|\,H_1) + \lambda P(\text{Correct-detection/estimation}\,|\,H_0) \nonumber \\
&\quad = \int \delta_1(X) \Big( \sum_{l=1}^{L_1} q_{1l}(X)\pi_{1l} f_{1l}(X) \Big) dX + \lambda \int \delta_0(X) \Big( \sum_{l=1}^{L_0} q_{0l}(X)\pi_{0l} f_{0l}(X) \Big) dX \tag{17} \\
&\quad \le \int \delta_1(X) \max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\}\,dX + \lambda \int \delta_0(X) \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}\,dX \tag{18} \\
&\quad = \int \Big[ \delta_1(X) \max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\} + \delta_0(X) \lambda \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\} \Big] dX \tag{19} \\
&\quad \le \int \max\Big\{ \max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\},\ \lambda \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\} \Big\}\,dX. \tag{20}
\end{align}

Inequality (18) is valid because the functions $q_{il}(X)$, $l = 1, \ldots, L_i$, are nonnegative and complementary (their sum is equal to 1). Inequality (20) is also true because the same properties hold for $\delta_i(X)$, $i = 0, 1$. Note that the final expression constitutes an upper bound on the performance of any detection/estimation rule. Furthermore, this upper bound is attainable by a specific detection/estimation strategy. Indeed, we note that we have equality in (18) when the estimation probabilities are selected as



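$$q_{ik}(X) = \begin{cases} 1 & \text{if } k = \arg\max_{1 \le l \le L_i} \{\pi_{il} f_{il}(X)\} \\ 0 & \text{otherwise,} \end{cases} \tag{21}$$

and we randomize if more than one index attains the same maximum. This optimum estimation process is the randomized equivalent of (14). Similarly we have equality in (20) when we select the detection probabilities to be

$$\delta_1(X) = \begin{cases} 1 & \text{if } \max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\} > \lambda \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\} \\ \gamma & \text{if } \max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\} = \lambda \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\} \\ 0 & \text{otherwise,} \end{cases} \tag{22}$$

and $\delta_0(X) = 1 - \delta_1(X)$. Clearly this optimum detection procedure is the equivalent of (13).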
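As far as the false alarm constraint is concerned, let us define the following sets

$$A(\lambda) = \left\{ X : \frac{\max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\}}{\max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}} > \lambda \right\}, \qquad B(\lambda) = \left\{ X : \frac{\max_{1 \le l \le L_1} \{\pi_{1l} f_{1l}(X)\}}{\max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}} = \lambda \right\}. \tag{23}$$

For the test introduced above, we can then write that

\begin{align}
P(\text{Detection/estimation-error}\,|\,H_0) &= 1 - \int_{(A(\lambda) \cup B(\lambda))^c} \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}\,dX - (1 - \gamma) \int_{B(\lambda)} \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}\,dX \nonumber \\
&\ge 1 - \int \max_{1 \le l \le L_0} \{\pi_{0l} f_{0l}(X)\}\,dX = \alpha_{\min}. \tag{24}
\end{align}

The lower bound $\alpha_{\min}$ is clearly attainable in the limit by selecting $\gamma = 0$ and letting $\lambda \to \infty$, in which case the test in (22) always decides in favor of $H_0$. Also, the detection/estimation-error probability is bounded from above by 1 and we can see that this value can also be attained in the limit by selecting $\gamma = 1$ and letting $\lambda \to 0$, in which case the test always decides in favor of $H_1$. Existence of a suitable threshold $\lambda$ and a randomization probability $\gamma$ that assure validity of the false alarm constraint with equality, as well as optimality of the resulting test in the desired sense, can be easily demonstrated following exactly the same steps as in the classical Neyman-Pearson case². This concludes the proof. ■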
² In the proof we simply replace the pdfs $f_i(X)$ with the functions $\max_{1 \le l \le L_i} \{\pi_{il} f_{il}(X)\}$. Even though these functions are not densities, the proof goes through without change.

We realize that in order to apply the test in (13) we need knowledge of the prior probabilities $\pi_{il}$. Whenever this information is not available we can consider equiprobable subcases and select $\pi_{il} = 1/L_i$. Under this assumption the optimum test in (13) is reduced to the familiar form of the GLR test,

$$\frac{\max_{1 \le l \le L_1} f_{1l}(X)}{\max_{1 \le l \le L_0} f_{0l}(X)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda, \tag{25}$$

after absorbing the two prior probabilities inside the threshold.

Finally, we should mention that if hypothesis $H_0$ is simple or if, under hypothesis $H_0$, we are not interested in the estimation problem (therefore we can treat it as simple by forming the marginal density), then $P(\text{Detection/estimation-error}\,|\,H_0)$ becomes the usual false alarm probability with corresponding $\alpha_{\min} = 0$. In other words, the false alarm probability can take any value in the interval $(0, 1)$, as in the classical Neyman-Pearson problem.

Remark 2: We observe that the optimum test, under each main hypothesis, selects the most appropriate subcase with the help of the MAP selection rule (14). The interesting point is that this selection is performed independently of the other hypothesis and of the corresponding detection strategy. This is clearly a very desirable characteristic since it separates the estimation from the detection problem. In our analysis we are going to provide sufficient conditions that can guarantee the same property under the general formulation.

Remark 3: We obtain the GLR test by assuming that the prior probabilities are uniform. We will use the 
same principle in our general formulation to obtain tests that can be used as alternatives to the classical 
GLR test. 
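As a worked step for the reduction mentioned in Remark 3, substituting the uniform priors $\pi_{il} = 1/L_i$ in (13) gives the following chain. This is a sketch that uses only the definitions of Theorem 1; the symbol $\lambda'$ for the rescaled threshold is introduced here for illustration.

```latex
\frac{\max_{1\le l\le L_1}\{\tfrac{1}{L_1} f_{1l}(X)\}}
     {\max_{1\le l\le L_0}\{\tfrac{1}{L_0} f_{0l}(X)\}}
= \frac{L_0}{L_1}\,
  \frac{\max_{1\le l\le L_1} f_{1l}(X)}{\max_{1\le l\le L_0} f_{0l}(X)}
  \underset{H_0}{\overset{H_1}{\gtrless}} \lambda
\quad\Longleftrightarrow\quad
\frac{\max_{1\le l\le L_1} f_{1l}(X)}{\max_{1\le l\le L_0} f_{0l}(X)}
  \underset{H_0}{\overset{H_1}{\gtrless}} \lambda' = \lambda\,\frac{L_1}{L_0}.
```

Since the threshold is in any case tuned to meet the constraint (11) with equality, the constant factor $L_1/L_0$ has no effect on the resulting test.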

III. Optimum Detection/Estimation Scheme 

Let us now continue with the solution of the optimization problem defined in (7). We have the following 
theorem that provides the desired optimal detection/estimation structure. 

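Theorem 2. Consider the class $\mathcal{J}_\alpha$ of detection/estimation structures $\mathcal{D}$ that satisfy $\mathscr{J}_0(\mathcal{D}) \le \alpha$. The test that minimizes the average cost $\mathscr{J}_1(\mathcal{D})$ within the class $\mathcal{J}_\alpha$ is given by

$$\inf_U [\mathscr{D}_{01}(U, X) + \lambda \mathscr{D}_{00}(U, X)] \underset{H_0}{\overset{H_1}{\gtrless}} \inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(U, X)] \tag{26}$$

with the optimum estimators defined by

$$\hat\theta_j = \arg\inf_U [\mathscr{D}_{j1}(U, X) + \lambda \mathscr{D}_{j0}(U, X)], \quad j = 0, 1, \tag{27}$$

and $\lambda \ge 0$ a threshold properly selected to satisfy the corresponding constraint with equality.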

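Proof: Let $\lambda \ge 0$ be a Lagrange multiplier and consider the unconstrained minimization of the combination $\mathscr{J}_1(\mathcal{D}) + \lambda \mathscr{J}_0(\mathcal{D})$. Using (6) we can write

\begin{align}
\mathscr{J}_1(\mathcal{D}) + \lambda \mathscr{J}_0(\mathcal{D}) &= \int \Big\{ \delta_0(X) \int q_0(\hat\theta_0|X) [\mathscr{D}_{01}(\hat\theta_0, X) + \lambda \mathscr{D}_{00}(\hat\theta_0, X)]\,d\hat\theta_0 \nonumber \\
&\qquad + \delta_1(X) \int q_1(\hat\theta_1|X) [\mathscr{D}_{11}(\hat\theta_1, X) + \lambda \mathscr{D}_{10}(\hat\theta_1, X)]\,d\hat\theta_1 \Big\}\,dX \tag{28} \\
&\ge \int \Big[ \delta_0(X) \inf_U [\mathscr{D}_{01}(U, X) + \lambda \mathscr{D}_{00}(U, X)] + \delta_1(X) \inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(U, X)] \Big]\,dX \tag{29} \\
&\ge \int \min\Big\{ \inf_U [\mathscr{D}_{01}(U, X) + \lambda \mathscr{D}_{00}(U, X)],\ \inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(U, X)] \Big\}\,dX. \tag{30}
\end{align}

The inequality in (29) is true because

\begin{align}
\int q_i(\hat\theta_i|X) [\mathscr{D}_{i1}(\hat\theta_i, X) + \lambda \mathscr{D}_{i0}(\hat\theta_i, X)]\,d\hat\theta_i &\ge \inf_U [\mathscr{D}_{i1}(U, X) + \lambda \mathscr{D}_{i0}(U, X)] \int q_i(\hat\theta_i|X)\,d\hat\theta_i \nonumber \\
&= \inf_U [\mathscr{D}_{i1}(U, X) + \lambda \mathscr{D}_{i0}(U, X)] \tag{31}
\end{align}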






MOUSTAKIDES: FINITE SAMPLE SIZE OPTIMALITY OF GLR TESTS 






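with equality iff $q_i(\hat\theta_i|X)$ puts all its probability mass on the choice $\hat\theta_i = \arg\inf_U [\mathscr{D}_{i1}(U, X) + \lambda \mathscr{D}_{i0}(U, X)]$, which is thereby optimum. Similarly, (30) is true because $\delta_0(X) + \delta_1(X) = 1$, and we have equality iff

$$\delta_1(X) = \begin{cases} 1 & \text{if } \inf_U [\mathscr{D}_{01}(U, X) + \lambda \mathscr{D}_{00}(U, X)] > \inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(U, X)] \\ \gamma & \text{if } \inf_U [\mathscr{D}_{01}(U, X) + \lambda \mathscr{D}_{00}(U, X)] = \inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(U, X)] \\ 0 & \text{if } \inf_U [\mathscr{D}_{01}(U, X) + \lambda \mathscr{D}_{00}(U, X)] < \inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(U, X)], \end{cases} \tag{32}$$

with $0 \le \gamma \le 1$ and $\delta_0(X) = 1 - \delta_1(X)$. This is the randomized version of (26). This completes the proof. ■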
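The structure of the test in Theorem 2 can be sketched numerically when the candidate estimates $U$ are restricted to finite grids. The grids, the function name and the tie-breaking rule (ties go to $H_1$, i.e. $\gamma = 1$) are assumptions of this illustration; the inputs are the values $\mathscr{D}_{ji}(U_k, X)$ already evaluated for the observed $X$.

```python
def theorem2_test(D01, D00, D11, D10, lam):
    """Evaluate the test and the estimators on finite grids of candidates U.

    D01, D00 : lists with D_01(U_k, X) and D_00(U_k, X) on the grid used
               when deciding H0 (hypothetical grid of candidate estimates)
    D11, D10 : same for D_11 and D_10 on the grid used when deciding H1
    lam      : Lagrange multiplier / threshold
    Returns (decision, index of the optimum estimate in the chosen grid).
    """
    # Combined costs attached to deciding H0 and to deciding H1
    g0 = [a + lam * b for a, b in zip(D01, D00)]
    g1 = [a + lam * b for a, b in zip(D11, D10)]
    k0 = min(range(len(g0)), key=g0.__getitem__)  # estimator used under d = 0
    k1 = min(range(len(g1)), key=g1.__getitem__)  # estimator used under d = 1
    # Decide H1 when its minimal combined cost does not exceed that of H0;
    # ties go to H1 here, which is only one admissible randomization
    if g0[k0] >= g1[k1]:
        return 1, k1
    return 0, k0
```

Both minimizers are computed before the comparison, yet only the one belonging to the selected hypothesis is reported as the estimate.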
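Remark 4: For the level $\alpha$ we have $\alpha_{\min} \le \alpha \le \alpha_{\max}$. It is possible to come up with an expression for $\alpha_{\min}$. Indeed, from (6) it is easy to see that

\begin{align}
\mathscr{J}_0(\mathcal{D}) &\ge \int \Big\{ \delta_0(X) \inf_U \mathscr{D}_{00}(U, X) + \delta_1(X) \inf_U \mathscr{D}_{10}(U, X) \Big\}\,dX \tag{33} \\
&\ge \int \min\Big\{ \inf_U \mathscr{D}_{00}(U, X),\ \inf_U \mathscr{D}_{10}(U, X) \Big\}\,dX = \alpha_{\min}. \tag{34}
\end{align}

This lower bound is in fact attainable by the optimum scheme defined with (26), (27) if we let $\lambda \to \infty$, since the term $\lambda \mathscr{J}_0(\mathcal{D})$ then dominates the combined cost. Unfortunately, we were not able to obtain a similar expression for the upper bound $\alpha_{\max}$.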

Remark 5: As we can see from (26), (27), the optimal solutions for the detection and estimation subproblems are interrelated. If we are interested in the same characteristic we encountered in the GLR test, where the two estimation problems are independent of each other and of the detection part, then the following special form of the cost functions assures the validity of this property

$$C_{01}(U, \theta_1) = C_{01}(\theta_1) \quad \text{and} \quad C_{10}(U, \theta_0) = C_{10}(\theta_0). \tag{35}$$

Indeed, we can see that if (35) is true then $\mathscr{D}_{01}(U, X) = \mathscr{D}_{01}(X)$ and $\mathscr{D}_{10}(U, X) = \mathscr{D}_{10}(X)$, which implies that the optimum estimators in (27) simplify to

\begin{align}
\hat\theta_0 &= \arg\inf_U [\mathscr{D}_{01}(X) + \lambda \mathscr{D}_{00}(U, X)] = \arg\inf_U \mathscr{D}_{00}(U, X) \tag{36} \\
\hat\theta_1 &= \arg\inf_U [\mathscr{D}_{11}(U, X) + \lambda \mathscr{D}_{10}(X)] = \arg\inf_U \mathscr{D}_{11}(U, X), \tag{37}
\end{align}

that is, they coincide with the classical Bayesian estimators which we obtain by treating each estimation problem separately. The optimum detector in (26), under the same assumptions, takes the form

$$\frac{\mathscr{D}_{01}(X) - \inf_U \mathscr{D}_{11}(U, X)}{\mathscr{D}_{10}(X) - \inf_U \mathscr{D}_{00}(U, X)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda, \tag{38}$$

which of course relies on the optimum cost values.

Remark 6: Observing (26) and (27), it seems as if the order of the two steps in our two-step procedure has been reversed. This impression however is not exactly correct. We note that the minimal value of a function is unique and it is the two minimal values that are used in (26). The actual estimates that realize the two minima, and are depicted in (27), are not necessarily unique and therefore we might require randomization, which is performed in the second step. But even if the two estimators are deterministic, it is the first step that will dictate which of the two values will be used as our actual parameter estimate. And this selection is performed after the detection step. Therefore, strictly speaking, the order is not reversed.

A. Special Case 

We would now like to pay attention to a particular case that is common in applications. Suppose that under $H_1$ we have $X \sim f_1(X|\theta)$, where $\theta$ is a parameter vector with known prior $\pi(\theta)$, and that under $H_0$ we have $X \sim f_0(X)$. In other words, the pdf under $H_0$ is completely known. In fact it is very common to have $f_0(X) = f_1(X|\theta = 0)$. Our goal is to test $H_0$ against $H_1$ and, whenever we decide in favor of $H_1$, to provide an estimate $\hat\theta$ for the corresponding parameter vector $\theta$. We should mention that the two application problems discussed in the Introduction fall under this particular class.

Since parameter estimation is needed only under $H_1$, a combined detection/estimation structure will be comprised of the functions $\mathcal{D} = \{\delta_0(X), \delta_1(X), q_1(\hat\theta|X)\}$ that satisfy $\delta_j(X) \ge 0$, $j = 0, 1$, $q_1(\hat\theta|X) \ge 0$, and $\delta_0(X) + \delta_1(X) = \int q_1(\hat\theta|X)\,d\hat\theta = 1$. The two probabilities $\delta_0(X)$, $\delta_1(X)$ will be used in the first step to decide between the two main hypotheses, while $q_1(\hat\theta|X)$ will be employed in the second step to provide the necessary estimate for $\theta$ every time a decision in favor of $H_1$ is reached.

Regarding the estimation costs we have the following functions: $C_{11}(\hat\theta, \theta)$, $C_{10}(\hat\theta)$, $C_{01}(\theta)$ and $C_{00}$. As we can see, $C_{00}$ is simply a constant, whereas $C_{10}(\cdot)$ and $C_{01}(\cdot)$ are functions of a single quantity. Consider now the selection $C_{00} = 0$ and $C_{10}(\hat\theta) = 1$; then it is easy to verify that $\mathscr{J}_0(\mathcal{D}) = P(d = 1|H_0)$, i.e. the probability of false alarm. For this particular selection we have the following interesting corollary of Theorem 2.

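Corollary 1. Consider the average cost $\mathscr{J}_1(\mathcal{D})$ under $H_1$ defined using the two cost functions $C_{11}(\hat\theta, \theta)$ and $C_{01}(\theta)$. The optimum detection/estimation structure that minimizes $\mathscr{J}_1(\mathcal{D})$ under the constraint that the false alarm probability $P(d = 1|H_0)$ is no larger than $\alpha \in (0, 1)$ is given by

$$\frac{\mathscr{D}_{01}(X) - \inf_U \mathscr{D}_{11}(U, X)}{f_0(X)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \tag{39}$$

for the optimum detector and

$$\hat\theta_1 = \arg\inf_U \mathscr{D}_{11}(U, X) \tag{40}$$

for the corresponding optimum estimator. The two functions $\mathscr{D}_{11}(U, X)$, $\mathscr{D}_{01}(X)$ are defined as follows

$$\mathscr{D}_{11}(U, X) = \int C_{11}(U, \theta) f_1(X|\theta)\pi(\theta)\,d\theta, \qquad \mathscr{D}_{01}(X) = \int C_{01}(\theta) f_1(X|\theta)\pi(\theta)\,d\theta. \tag{41}$$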

B. Discussion 

In finite-sample-size optimum detection and estimation the need for the prior pdfs constitutes a 
very severe weakness. As we mentioned earlier, if this information is not available the corresponding 
optimization problems must be treated in some min-max context. Unfortunately min-max formulations 
tend to be very difficult to solve even asymptotically, and no systematic solution exists for the problems 
of detection and estimation. It is of course clear that the same limitation applies in the case of the more 
general combined detection/estimation problem. 

A simple (ad hoc) method to bypass the need for min-max approaches is to apply the same idea used to demonstrate optimality of the GLR test, namely to assume that the priors are uniform. Of course this selection is arbitrary and does not guarantee optimality of the corresponding scheme in any min-max sense. On the other hand, it is the only logical choice that reflects our complete lack of knowledge about the priors. The corresponding tests, examples of which will be seen in the next section, are expected to have the same weakness as the GLR test, with one major difference: they will be tailored to the specific cost function adopted in the estimation subproblem.



IV. Examples

In this section we present a number of interesting examples by selecting various well known forms of cost functions. We concentrate on the popular costs encountered in classical Bayesian estimation theory. We start with the MAP estimate, which demonstrates optimality of the GLR test in the continuous case.

A. MAP Detection/Estimation

Consider the following combination of cost functions

$$C_{01}(U, \theta) = C_{10}(U, \theta) = 1; \quad C_{00}(U, \theta) = C_{11}(U, \theta) = \begin{cases} 0 & \text{if } \|U - \theta\| \le \Delta \ll 1 \\ 1 & \text{otherwise.} \end{cases} \tag{42}$$

We recall from classical Bayesian estimation theory (see [9, Page 145]) that, as $\Delta \to 0$ and assuming sufficient smoothness of the pdfs, this selection of costs leads to MAP parameter estimation under each main hypothesis. Indeed we observe³

$$\mathscr{D}_{jj}(U, X) \approx \int f_j(X|\theta_j)\pi_j(\theta_j)\,d\theta_j - f_j(X|U)\pi_j(U)V_j(\Delta) \tag{43}$$

where $V_j(\Delta)$ is the volume of a hypersphere of radius $\Delta$ (which can be different for each hypothesis if the two parameter vectors are not of the same length). Substituting in (38) yields

$$\frac{\sup_U f_1(X|U)\pi_1(U)}{\sup_U f_0(X|U)\pi_0(U)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \frac{V_0(\Delta)}{V_1(\Delta)} = \lambda', \tag{44}$$

and the optimum estimator under each hypothesis is the MAP estimator

$$\hat\theta_j = \arg\sup_U f_j(X|U)\pi_j(U). \tag{45}$$

Similarly, for the special case of Corollary 1, if we define

$$C_{11}(U, \theta) = \begin{cases} 0 & \text{if } \|U - \theta\| \le \Delta \ll 1 \\ 1 & \text{otherwise,} \end{cases} \tag{46}$$

and $C_{01}(\theta) = 1$, then $\mathscr{D}_{11}(U, X) \approx \int f_1(X|\theta)\pi(\theta)\,d\theta - f_1(X|U)\pi(U)V_1(\Delta)$ and the optimum test in (39) takes the form

$$\frac{\sup_U f_1(X|U)\pi(U)}{f_0(X)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\lambda}{V_1(\Delta)} = \lambda', \tag{47}$$

with the optimum estimator being $\hat\theta = \arg\sup_U f_1(X|U)\pi(U)$. In both tests (44) and (47), the threshold $\lambda'$ and the corresponding randomization probability $\gamma$ are selected to satisfy the false alarm constraint with equality. If the priors $\pi_i(\theta_i)$, $\pi(\theta)$ are unknown and are replaced with the uniform over some prior sets $\Theta_i$, we obtain the classical form of the GLR test depicted in (3).

B. MMSE Detection/Estimation 

Let us now develop the first test that can be used as an alternative to the GLR test. Consider the 
following costs 

Coi(£/,0i) = Coi(0i); C w {U,e ) = C w {e ); C 00 (U,e) = C U (U,9) = \\U - 6\\ 2 , (48) 

3 The approximate equality becomes exact as A — > 0. 
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where Coi(#i), Cio(#i) are functions to be specified in the sequel. Due to the specific form of the costs, 
the two estimators are independent from each other and also independent from the detection part. Under 
each main hypothesis the optimum estimator is obtained by minimizing the corresponding mean square 
error. Consequently the optimum estimator is the conditional mean of the parameter vector given the data 
vector X (see [9, Page 143]). Specifically we have 

mm - MWj . (49) 

The corresponding optimum test after substituting in (38) takes the form 

Hi 

Ai(X) | XAq(X) (50) 

Ho 

where 

A (X) = ||0 o || 2 /oPO + J [C 10 (0 ) - \\e \\ 2 ]fo(X\6 )Tr (6o)d6 

Ai(X) = ||0i|| 2 /iPO + J [C O i(0i) - \\9 1 \\ 2 }f l (X\9 1 )ir l (9 1 )d9 1 (51) 

fj(X) = J f;,i X» , -.-,«) , )<!(),. 
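Selecting $C_{01}(\theta_1) = \|\theta_1\|^2$ and $C_{10}(\theta_0) = \|\theta_0\|^2$ simplifies the test considerably yielding

$$\frac{\|\hat\theta_1\|^2}{\|\hat\theta_0\|^2}\,\frac{f_1(X)}{f_0(X)} = \frac{\|\hat\theta_1\|^2 \int f_1(X|\theta_1)\pi_1(\theta_1)\,d\theta_1}{\|\hat\theta_0\|^2 \int f_0(X|\theta_0)\pi_0(\theta_0)\,d\theta_0} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda. \qquad (52)$$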

We recognize in the second ratio the likelihood ratio that is used to decide optimally between the two main
hypotheses. By including the first ratio, formed by the squared norms of the two estimates, the test performs
simultaneously optimum detection and estimation.

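For the special case of Corollary 1 it is easy to verify that the corresponding test takes the form

$$\frac{\|\hat\theta\|^2 f_1(X) + \int [C_{01}(\theta) - \|\theta\|^2]\, f_1(X|\theta)\pi(\theta)\,d\theta}{f_0(X)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \qquad (53)$$

which, if we select $C_{01}(\theta) = \|\theta\|^2$, simplifies to

$$\|\hat\theta\|^2\,\frac{f_1(X)}{f_0(X)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \qquad (54)$$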

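where $\hat\theta = E[\theta|X, H_1] = \int \theta f_1(X|\theta)\pi(\theta)\,d\theta \,/ \int f_1(X|\theta)\pi(\theta)\,d\theta$ and $f_1(X) = \int f_1(X|\theta)\pi(\theta)\,d\theta$.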

In both tests in (50) and (53), if the priors are not known and are replaced by uniforms, we obtain 
tests that are the equivalent of the GLR test for the MMSE criterion. 
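As an illustration of (54) with a uniform prior, the following minimal sketch evaluates the statistic $\|\hat\theta\|^2 f_1(X)/f_0(X)$ numerically. The model is an assumption made only for this example and does not come from the text: $\theta$ is scalar, the samples are i.i.d. $N(\theta,1)$ under $H_1$ and i.i.d. $N(0,1)$ under $H_0$, and the prior is tabulated on a uniform grid. Under this assumption the ratio $f_1(X|\theta)/f_0(X)$ equals $\exp(\theta\sum_i x_i - N\theta^2/2)$. The function name `mmse_statistic` is hypothetical.

```python
import numpy as np

def mmse_statistic(x, theta_grid, prior):
    """Return (theta_hat, statistic) for the test statistic theta_hat^2 * f_1(X)/f_0(X).

    Assumed model (illustration only): x_i ~ N(theta, 1) under H1, N(0, 1) under H0.
    theta_grid must be uniformly spaced; prior holds the prior density on that grid.
    """
    x = np.asarray(x, dtype=float)
    n, s = x.size, x.sum()
    dtheta = theta_grid[1] - theta_grid[0]
    # f_1(X|theta) / f_0(X) for the assumed Gaussian model
    ratio = np.exp(theta_grid * s - 0.5 * n * theta_grid**2)
    # Riemann-sum weights approximating f_1(X|theta) pi(theta) dtheta / f_0(X)
    w = ratio * prior * dtheta
    f1_over_f0 = w.sum()
    # conditional mean of theta given X under H1
    theta_hat = (theta_grid * w).sum() / f1_over_f0
    return theta_hat, theta_hat**2 * f1_over_f0

grid = np.linspace(-5.0, 5.0, 2001)
uniform_prior = np.full_like(grid, 1.0 / 10.0)   # uniform density on [-5, 5]
theta_hat, stat = mmse_statistic([2.0, 2.0, 2.0, 2.0], grid, uniform_prior)
```

The statistic would then be compared with a threshold chosen to meet the false alarm constraint. The grid limits and resolution are arbitrary choices for this sketch.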
C. Median Detection/Estimation

As our final example we present the case of the median estimation where $\theta_0, \theta_1, \theta, U$ are scalars and
we select the cost functions as follows

$$C_{01}(U,\theta) = C_{01}(\theta);\quad C_{10}(U,\theta) = C_{10}(\theta);\quad C_{00}(U,\theta) = C_{11}(U,\theta) = |U-\theta|. \qquad (55)$$



The estimators are again independent of each other and of the detection part. Under each hypothesis
we perform optimum Bayes estimation and for this specific cost function we know that the optimum
estimator is the conditional median [9, Page 143]

$$\hat\theta_j = \arg\{y : P(\theta_j \le y \mid X, H_j) = 0.5\}, \quad j = 0, 1. \qquad (56)$$

The optimum test, as before, becomes

$$A_1(X) \underset{H_0}{\overset{H_1}{\gtrless}} \lambda A_0(X) \qquad (57)$$



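where

$$\begin{aligned}
A_0(X) &= \int [C_{10}(\theta_0) + \theta_0\,\mathrm{sgn}(\hat\theta_0 - \theta_0)]\, f_0(X|\theta_0)\pi_0(\theta_0)\,d\theta_0 \\
A_1(X) &= \int [C_{01}(\theta_1) + \theta_1\,\mathrm{sgn}(\hat\theta_1 - \theta_1)]\, f_1(X|\theta_1)\pi_1(\theta_1)\,d\theta_1.
\end{aligned} \qquad (58)$$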
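The appearance of the sign function can be traced with a short calculation. The sketch below assumes that $A_1(X)$ is the integral of $C_{01}(\theta_1) - |\hat\theta_1 - \theta_1|$ against $f_1(X|\theta_1)\pi_1(\theta_1)$, which is our reading of the general test and is stated here as an assumption. It also assumes a continuous posterior, so that the conditional median splits the posterior mass into two equal halves. As before, $f_1\pi_1$ abbreviates $f_1(X|\theta_1)\pi_1(\theta_1)$.

```latex
\begin{align*}
|\hat\theta_1-\theta_1| &= (\hat\theta_1-\theta_1)\,\mathrm{sgn}(\hat\theta_1-\theta_1), \\
\int \hat\theta_1\,\mathrm{sgn}(\hat\theta_1-\theta_1)\, f_1\pi_1\,d\theta_1
  &= \hat\theta_1 f_1(X)\bigl[P(\theta_1<\hat\theta_1 \mid X,H_1)
     - P(\theta_1>\hat\theta_1 \mid X,H_1)\bigr] = 0, \\
-\int |\hat\theta_1-\theta_1|\, f_1\pi_1\,d\theta_1
  &= \int \theta_1\,\mathrm{sgn}(\hat\theta_1-\theta_1)\, f_1\pi_1\,d\theta_1 .
\end{align*}
```

Adding $\int C_{01}(\theta_1) f_1\pi_1\,d\theta_1$ to both sides of the last line gives the stated expression for $A_1(X)$, and the same argument applies to $A_0(X)$.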



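If additionally we select $C_{01}(\theta_1) = |\theta_1|$ and $C_{10}(\theta_0) = |\theta_0|$ then the optimum test takes the more
convenient form

$$\frac{\int_0^{\hat\theta_1} \theta_1 f_1(X|\theta_1)\pi_1(\theta_1)\,d\theta_1}{\int_0^{\hat\theta_0} \theta_0 f_0(X|\theta_0)\pi_0(\theta_0)\,d\theta_0} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda. \qquad (59)$$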



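For the special case of Corollary 1 and for $C_{01}(\theta) = |\theta|$, the corresponding optimum test reduces to

$$\frac{\int_0^{\hat\theta} \theta f_1(X|\theta)\pi(\theta)\,d\theta}{f_0(X)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \qquad (60)$$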



while the optimum estimator is $\hat\theta = \arg\{y : P(\theta \le y \mid X, H_1) = 0.5\}$. Finally when the priors are selected
to be uniform, we obtain a test that is the alternative to the GLR test but tuned to the specific Bayesian
criterion we employ in the estimation part.
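The median-based test (60) can be sketched numerically in the same way as the MMSE case. The code below is only an illustration under an assumed model that is not taken from the text: scalar $\theta$, i.i.d. $N(\theta,1)$ samples under $H_1$, i.i.d. $N(0,1)$ samples under $H_0$, and a prior tabulated on a uniform grid. The conditional median is approximated by the first grid point where the discretized posterior cdf reaches 0.5. The function name `median_statistic` is hypothetical.

```python
import numpy as np

def median_statistic(x, theta_grid, prior):
    """Return (theta_hat, statistic) where theta_hat is the conditional median and
    statistic approximates the integral from 0 to theta_hat of
    theta * f_1(X|theta) * pi(theta) / f_0(X).

    Assumed model (illustration only): x_i ~ N(theta, 1) under H1, N(0, 1) under H0.
    """
    x = np.asarray(x, dtype=float)
    n, s = x.size, x.sum()
    dtheta = theta_grid[1] - theta_grid[0]
    # Riemann-sum weights approximating f_1(X|theta) pi(theta) dtheta / f_0(X)
    w = np.exp(theta_grid * s - 0.5 * n * theta_grid**2) * prior * dtheta
    # discretized posterior cdf and its median
    cdf = np.cumsum(w) / w.sum()
    theta_hat = theta_grid[np.searchsorted(cdf, 0.5)]
    lo, hi = min(0.0, theta_hat), max(0.0, theta_hat)
    inside = (theta_grid >= lo) & (theta_grid <= hi)
    # the sign factor turns the sum over [theta_hat, 0] into the integral from 0 to theta_hat
    statistic = np.sign(theta_hat) * (theta_grid * w)[inside].sum()
    return theta_hat, statistic

grid = np.linspace(-5.0, 5.0, 2001)
uniform_prior = np.full_like(grid, 1.0 / 10.0)   # uniform density on [-5, 5]
med_pos, stat_pos = median_statistic([2.0, 2.0, 2.0, 2.0], grid, uniform_prior)
med_neg, stat_neg = median_statistic([-2.0, -2.0, -2.0, -2.0], grid, uniform_prior)
```

Because the integrand $\theta f_1(X|\theta)\pi(\theta)$ has the same sign as $\hat\theta$ over the integration range, the statistic is nonnegative, and in this symmetric example data with opposite signs produce the same value.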
V. Application to Retrospective Changepoint Detection



Perhaps the most appropriate application where one would readily need to replace the GLR test with
an alternative scheme is the problem of target detection and localization. Clearly for this problem the
most suitable cost function is the mean square error between the location estimate and the true position.
This choice will inevitably lead to the use of tests that are similar to (53), yielding a completely novel
approach for this intriguing problem. Unfortunately the corresponding derivations are lengthy and thus
impossible to detail here. In the limited space at our disposal it is feasible to treat, with our
preceding methodology, the second application we mentioned in the Introduction, namely the retrospective
changepoint detection problem. We would like to mention that even though in this problem the estimation
costs are MAP-like, suggesting use of the GLR test, as we will see, there is sufficient simplicity and
originality in our results to make our analysis interesting and worth including in this article.

In its simplest form, retrospective changepoint detection is about an observation vector $X \in \mathbb{R}^N$ and
two pdfs $f_\infty(X)$ and $f_0(X)$ which are completely known. If $X = [x_1, \ldots, x_N]^t$ then we assume that
there is an unknown point $\tau$ such that the samples $\{x_1, \ldots, x_\tau\}$ follow the nominal measure $f_\infty(X)$
while the $\{x_{\tau+1}, \ldots, x_N\}$ switch to the alternative $f_0(X)$. Consequently, the changepoint $\tau$ is the last
point where the samples follow the nominal regime$^4$.

We are interested in deciding whether the change took place within or before the given collection of
samples, that is $\tau < N$, or the change will take place at some future point (possibly at infinity), that is
$\tau \ge N$. In the former case we would also like to obtain an estimate $\hat\tau$ of the changepoint $\tau$. The combined
detection/estimation version of the retrospective changepoint detection problem, as we mentioned in the
Introduction, is suitable for formulating segmentation problems.

Let us first define the joint pdf $f_\tau(X)$ of the samples $X$ given $\tau$. We distinguish three sets of values
for $\tau$, namely $\tau \le 0$, $\tau \in \{1, \ldots, N-1\}$ and $\tau \ge N$. The first corresponds to a change occurring before
taking any samples, the second to a change within the available sample set and the third to the change
occurring after we acquired the samples. The most common model for the induced joint pdf is [10], [11]



4 Notation seems to be somewhat awkward compared to the usual one used in hypothesis testing. We simply follow the standard 
practice of sequential changepoint detection theory. 



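$$f_\tau(X) = \begin{cases} f_0(X) & \text{for } \tau \le 0 \\ f_\infty(X_1^\tau)\, f_0(X_{\tau+1}^N \mid X_1^\tau) & \text{for } 0 < \tau \le N-1 \\ f_\infty(X) & \text{for } N \le \tau, \end{cases} \qquad (61)$$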
where $X = [x_1, \ldots, x_N]^t$ and for $a \le b$ we define $X_a^b = [x_a, \ldots, x_b]^t$. We can see that if the change
takes place before the samples are acquired, all samples are under the alternative regime. If the change
takes place within the available set, then the initial portion of the samples follows the pdf of the nominal
regime while the final portion follows the conditional pdf of the alternative regime. Finally if the change does
not occur before or inside the available data set, all samples are under the nominal regime.
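The three regimes of the model can be made concrete with a small sketch. It is written under assumptions that are not part of the text: the samples are taken to be independent, so that the conditional pdf of the post-change block reduces to a product, and both regimes are unit-variance Gaussians that differ only in their mean. The names `gauss_logpdf` and `log_f_tau` and the default means are hypothetical.

```python
import numpy as np

def gauss_logpdf(x, mean):
    """Log-density of N(mean, 1) evaluated elementwise."""
    return -0.5 * (x - mean) ** 2 - 0.5 * np.log(2.0 * np.pi)

def log_f_tau(x, tau, mean_nominal=0.0, mean_alt=1.0):
    """Log of the joint pdf of x given changepoint tau, for independent samples.

    tau <= 0       : every sample follows the alternative regime
    0 < tau < N    : x_1..x_tau nominal, x_{tau+1}..x_N alternative
    tau >= N       : every sample follows the nominal regime
    """
    x = np.asarray(x, dtype=float)
    t = int(np.clip(tau, 0, x.size))   # number of samples under the nominal regime
    return gauss_logpdf(x[:t], mean_nominal).sum() + gauss_logpdf(x[t:], mean_alt).sum()

samples = [0.0, 1.0, 2.0]
log_densities = [log_f_tau(samples, tau) for tau in range(-1, 5)]
```

Clipping $\tau$ to the range $[0, N]$ reflects the fact that all values $\tau \le 0$ induce the same pdf, and likewise all values $\tau \ge N$.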

Regarding the changepoint $\tau$ there are different models. Detailed discussion of the various possibilities
can be found in [10], [11]. Here we limit ourselves to Shiryaev's popular Bayesian model. Specifically
we assume that $\tau$ is a random variable with a prior $\{\varpi_n\}$ defined as $\varpi_0 = P(\tau \le 0)$, $\varpi_n = P(\tau = n)$
for $0 < n \le N-1$, $\varpi_N = P(\tau \ge N)$ and such that $\sum_{n=0}^{N} \varpi_n = 1$.

As we mentioned, the goal is to test $\{\tau \le N-1\}$ against $\{\tau \ge N\}$, and in the former case provide
an estimate $\hat\tau$ for $\tau$. Formulating the problem according to our previous theory, we have that under
$H_0$ the samples follow the nominal pdf $f_\infty(X)$ while under $H_1$ we have $N$ different possibilities with
corresponding pdf $f_\tau(X)$ and prior $\pi_\tau = \varpi_\tau / (\sum_{k=0}^{N-1} \varpi_k) = \varpi_\tau/(1 - \varpi_N)$, where $0 \le \tau \le N-1$.

Let us now consider the combined detection/estimation problem in the sense of Corollary 1, namely
minimize the average cost under $H_1$ subject to a false alarm probability constraint under $H_0$. We propose
the following cost functions $C_{11}(\hat\tau, \tau) = 1_{\{\hat\tau \ne \tau\}}$, where $1_A$ denotes the indicator function of the set $A$,
and $C_{01}(\tau) = 1$. In other words we penalize with 1 a missed detection of $H_1$ but also the correct
detection of $H_1$ followed by an incorrect estimation of $\tau$. The average cost is simply the probability of
detection/estimation-error introduced in Subsection II.C. Applying the results of Corollary 1 and using
the Bayes rule and (61), the optimum detection/estimation structure is given by



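$$\max_{0 \le n < N} \pi_n \frac{f_0(X_{n+1}^N \mid X_1^n)}{f_\infty(X_{n+1}^N \mid X_1^n)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \qquad (62)$$

for the optimum detector and

$$\hat\tau = \arg\max_{0 \le n < N} \pi_n \frac{f_0(X_{n+1}^N \mid X_1^n)}{f_\infty(X_{n+1}^N \mid X_1^n)} \qquad (63)$$

for the corresponding optimum estimator. If the priors $\pi_n$, $n = 0, \ldots, N-1$, are unknown and we select
them to be equal, then we obtain the GLR test version of the problem

$$S_N = \max_{0 \le n < N} \frac{f_0(X_{n+1}^N \mid X_1^n)}{f_\infty(X_{n+1}^N \mid X_1^n)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda, \qquad (64)$$

where $S_N$ is known as the CUSUM statistic for point $N$.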
The previous result was of course expected since we followed the same formulation as the one used in
Subsection II.C to prove optimality of the GLR test. Interestingly, our theory allows for the development
of alternative detection/estimation structures in a simple and straightforward manner. For example one
might argue that the cost $C_{11}(\hat\tau, \tau) = 1_{\{\hat\tau \ne \tau\}}$ is overly stringent and propose as alternative the function
$C_{11}(\hat\tau, \tau) = 1_{\{|\hat\tau - \tau| > m\}}$ where $0 \le m \ll N$ is a nonnegative integer. In other words we tolerate errors
in the estimate of $\tau$ that do not exceed $m$ points. If $m = 0$ the problem is reduced to the case already
discussed. Clearly most practical segmentation problems would allow $m > 0$.

Again we adopt the setup proposed in Corollary 1. It is then easy to verify that we obtain the following
optimum structure

$$\max_{m \le n < N-m} \left\{ \sum_{k=-m}^{m} \pi_{n+k} \frac{f_{n+k}(X)}{f_\infty(X)} \right\} = \max_{m \le n < N-m} \left\{ \sum_{k=-m}^{m} \pi_{n+k} \frac{f_0(X_{n+k+1}^N \mid X_1^{n+k})}{f_\infty(X_{n+k+1}^N \mid X_1^{n+k})} \right\} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda \qquad (65)$$

for the detector and

$$\hat\tau = \arg\max_{m \le n < N-m} \left\{ \sum_{k=-m}^{m} \pi_{n+k} \frac{f_0(X_{n+k+1}^N \mid X_1^{n+k})}{f_\infty(X_{n+k+1}^N \mid X_1^{n+k})} \right\} \qquad (66)$$



for the estimator. Finally assuming uniform priors for the case where the probabilities $\{\pi_0, \ldots, \pi_{N-1}\}$
are unknown, leads to the test

$$S_N = \max_{m \le n < N-m} \left\{ \sum_{k=-m}^{m} \frac{f_0(X_{n+k+1}^N \mid X_1^{n+k})}{f_\infty(X_{n+k+1}^N \mid X_1^{n+k})} \right\} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda, \qquad (67)$$

which is completely novel and replaces the GLR test in (64), with $S_N$ being clearly different from the
CUSUM statistic.
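A minimal numerical sketch of the uniform-prior statistic with error tolerance $m$ is given below. It assumes independent samples, which is an assumption of this example and not of the text, so that the likelihood ratio for a change right after sample $n$ is the product of the per-sample ratios $f_0(x_i)/f_\infty(x_i)$ over $i > n$. The input is the vector of per-sample log-likelihood ratios, and setting `m=0` gives the maximum over single candidate changepoints. The function name `tolerant_changepoint_statistic` is hypothetical.

```python
import numpy as np

def tolerant_changepoint_statistic(llr, m=0):
    """Return (n_hat, S_N), where S_N = max over n of sum_{k=-m}^{m} L[n+k].

    llr[i] is log f_0(x_{i+1}) / f_inf(x_{i+1}) for sample i+1 (independence assumed).
    L[n] is the likelihood ratio for a change right after sample n, n = 0..N-1.
    """
    llr = np.asarray(llr, dtype=float)
    # reverse cumulative sum: log L[n] = llr[n] + ... + llr[N-1]
    L = np.exp(np.cumsum(llr[::-1])[::-1])
    # sums over 2m+1 neighbouring candidates; entry j is centred at n = j + m
    window = np.convolve(L, np.ones(2 * m + 1), mode="valid")
    j = int(np.argmax(window))
    return j + m, window[j]

llr = [-2.0, -1.0, 2.0, 1.0, 0.5]
n0, s0 = tolerant_changepoint_statistic(llr, m=0)
n1, s1 = tolerant_changepoint_statistic(llr, m=1)
```

For long records the exponentials can overflow, so a practical implementation would likely stay in the log domain. The sketch keeps the direct form to mirror the statistic as written.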




VI. Conclusion 

By introducing a joint detection/estimation formulation that properly combines the Neyman-Pearson
methodology (for detection) and the Bayesian methodology (for estimation), we derived optimum schemes
for problems that require simultaneous detection and estimation. An important side-product of our analysis is
the demonstration that the well known GLR test is finite-sample-size optimum in this joint-problem
sense. Furthermore we were able to provide completely novel GLR-type tests, derived by
replacing the MAP estimation cost function with other well known choices such as the mean square or
mean absolute estimation error. Finally, we used our proposed methodology to analyze the problem of
retrospective changepoint detection. This led to the development of a novel detection/estimation structure
that can replace the CUSUM approach which is obtained when we apply the GLR test.